This analysis is inspired by this article.
First load the necessary packages. For pattern.nlp you need Python (version below 3.0) installed on your system with the pattern library. In my case I got a ‘No module named pattern.db’ error. I solved this by manually copying the downloaded pattern folder into the Python site-packages folder.
library(pattern.nlp)
library(DT)
library(plotly)
The data comes from an online survey among students of the course Smart Industry at the HAN. After the data has been read the column names are simplified.
rawData <- read.csv("survey.csv", stringsAsFactors = FALSE)
newNames <- c(
"Verwachtingen",
"Studiebelasting",
"Verbanden",
"Pluspunten",
"Activiteiten",
"Activeren",
"Bijdrage"
)
colnames(rawData) <- newNames
Now the data is ready to be analysed for sentiment. Two parameters are calculated:
## Function reading text vector. Returns sentiment data frame
read_sentiments <- function(vector, label) {
tempDF <- NULL
for (n in vector) {
x <- pattern_sentiment(n, language = "dutch")
x <- cbind(x, label)
tempDF <- rbind(tempDF, x)
}
return(tempDF)
}
sentDF <- NULL
for (m in 2:dim(rawData)[2]) {
x <- read_sentiments(rawData[,m], newNames[m-1])
sentDF <- rbind(sentDF, x)
}
Now the data can be visualised using the plotly package.
sentDF$id <- sapply(sentDF$id,function(x) gsub("\\. ","\\.<br>",as.character(x)))
sentDF$polarity <- round(sentDF$polarity, 2)
pal <- rainbow(dim(rawData)[2]-1)
f <- list(
family = "Courier New, monospace",
size = 14,
color = "#7f7f7f"
)
x <- list(
title = "Polariteit (neg / pos)",
titlefont = f
)
y <- list(
title = "Subjectiviteit",
titlefont = f
)
plot_ly(sentDF, x = sentDF$polarity, y = sentDF$subjectivity, text = paste("Tekst: ", sentDF$id), hoverinfo = "text", mode = "markers", marker = list(size = 20), color = sentDF$label, colors = pal) %>%
layout(autosize = F, width = 1000, height = 800, xaxis = x, yaxis = y)
datatable(sentDF[, c("label", "id", "polarity")], filter = 'top', options = list(
pageLength = 25, autoWidth = TRUE
))